Algorithmic frequency/pitch detection series - Part 1: Making a simple pitch tracker using Zero Crossing Rate

Published in The Seeker’s Project, May 17, 2020

Algorithmic frequency detection is an area of research within Music Information Retrieval (MIR) with some really cool applications.

Frequency detection is used in many of the apps that musicians rely on, instrument tuners being one example. Tuning a stringed instrument has become very easy thanks to the handy mobile apps on our phones, which let us tune our instruments within minutes with great accuracy.

Under the hood of the GUIs that show us the tuning are some really intelligent algorithms. The basic workings of such applications can be simplified into a flowchart.

In this series, I will attempt to go deeper into several algorithms for frequency detection, starting with the most basic one and working my way up to more complex algorithms with each post.

Machine Learning and statistical models have also been widely used for frequency/pitch detection. But for the sake of simplicity, I will keep this series limited to just algorithmic frequency detection.

What is Zero Crossing Rate?

Zero Crossing Rate (ZCR) is a time-domain audio analysis tool that counts how often a signal crosses zero, which lets us estimate the frequency of monophonic audio. It is a very basic algorithm, and I discuss its performance in the Testing and Results section below.

Monophonic means that only one note is playing at any given time. ZCR performs very poorly on polyphonic audio (multiple notes playing simultaneously) and is better suited to monophonic audio.

Before we get into the algorithm, let’s brush up on how audio works. This will help us understand the intuition behind the ZCR algorithm.

1. Basics of audio

As we know, audio signals are longitudinal pressure waves that our ears interpret as sound.

Objects that vibrate regularly and periodically generate sounds which we interpret as tones or notes. Objects that vibrate in a non-regular or random fashion generate atonal or noisy sounds. — Jack Schaedler ¹

One of the key properties of a periodic audio signal is its frequency: the number of times the wave repeats itself per second, measured in hertz (Hz).

When you pluck the low E string on your guitar in standard tuning, it vibrates with a fundamental frequency of 82.41 Hz.
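To make the relationship between frequency and period concrete, here is a small Java sketch that generates a pure sine tone at the low E frequency as 16-bit samples and computes how many samples one cycle spans. The class and method names (SineToneDemo, sineTone, samplesPerPeriod), the 44,100 Hz sampling rate and the amplitude of 10,000 are assumptions chosen for illustration.

```java
public class SineToneDemo {
    // Hypothetical helper: a pure sine tone as 16-bit PCM samples.
    static short[] sineTone(double frequencyHz, int sampleRate, int numSamples, double amplitude) {
        short[] samples = new short[numSamples];
        for (int n = 0; n < numSamples; n++) {
            double t = (double) n / sampleRate; // time of sample n in seconds
            samples[n] = (short) Math.round(amplitude * Math.sin(2.0 * Math.PI * frequencyHz * t));
        }
        return samples;
    }

    // Number of samples in one cycle: sampling rate divided by frequency.
    static double samplesPerPeriod(double frequencyHz, int sampleRate) {
        return sampleRate / frequencyHz;
    }

    public static void main(String[] args) {
        double lowE = 82.41;    // low E string in standard tuning, in Hz
        int sampleRate = 44100; // assumed sampling rate
        short[] oneSecond = sineTone(lowE, sampleRate, sampleRate, 10000.0);
        System.out.printf("Period: %.2f ms, %.1f samples per cycle, %d samples generated%n",
                1000.0 / lowE, samplesPerPeriod(lowE, sampleRate), oneSecond.length);
    }
}
```

At 44,100 Hz, one cycle of the 82.41 Hz low E lasts 1/82.41 ≈ 12.13 ms, or about 535 samples. This repeating pattern is what ZCR relies on.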

2. Intuition behind ZCR

A digitized sound wave can be viewed as a stream of points in two-dimensional space (time and amplitude), fluctuating around a silent threshold with a repeating pattern and shape ². If we look at a signal through an oscilloscope, we can determine its frequency visually by measuring how long it takes for the cycle to repeat. For a digitized signal, we can do the same by counting the samples in one cycle, provided we know the sampling frequency.

We use this intuition in the ZCR algorithm. The diagram below shows an oscillating signal; the dotted lines mark the points where the signal crosses the silent threshold, i.e. the zero line.

For each buffer of the input audio stream, if we count how many times the signal crosses the silent threshold, we can calculate its frequency!
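A minimal Java sketch of this buffer-by-buffer idea, assuming the audio stream is already available as an array of 16-bit samples and is split into non-overlapping buffers of a fixed size. BufferedPitchTracker, trackPitch, countCrossings and synthStream are hypothetical names, and the 4,410-sample buffer (0.1 s at 44.1 kHz) is an arbitrary choice; a real tuner would read buffers from the microphone instead of synthesizing them.

```java
public class BufferedPitchTracker {
    // Count sign changes between adjacent samples in [start, end).
    static int countCrossings(short[] samples, int start, int end) {
        int count = 0;
        for (int i = start; i < end - 1; i++) {
            if ((samples[i] > 0 && samples[i + 1] <= 0) || (samples[i] < 0 && samples[i + 1] >= 0)) {
                count++;
            }
        }
        return count;
    }

    // One frequency estimate per full, non-overlapping buffer; leftover samples are ignored.
    static float[] trackPitch(short[] stream, int bufferSize, int sampleRate) {
        int numBuffers = stream.length / bufferSize;
        float[] estimates = new float[numBuffers];
        float bufferSeconds = (float) bufferSize / sampleRate;
        for (int b = 0; b < numBuffers; b++) {
            int start = b * bufferSize;
            int crossings = countCrossings(stream, start, start + bufferSize);
            estimates[b] = (crossings / 2.0f) / bufferSeconds; // two crossings per cycle
        }
        return estimates;
    }

    // Synthetic test stream: one buffer of a 220 Hz sine followed by one buffer of a 440 Hz sine.
    static short[] synthStream(int sampleRate, int bufferSize) {
        short[] stream = new short[2 * bufferSize];
        for (int n = 0; n < bufferSize; n++) {
            stream[n] = (short) Math.round(10000 * Math.sin(2 * Math.PI * 220.0 * n / sampleRate));
            stream[bufferSize + n] = (short) Math.round(10000 * Math.sin(2 * Math.PI * 440.0 * n / sampleRate));
        }
        return stream;
    }

    public static void main(String[] args) {
        int sampleRate = 44100;
        int bufferSize = 4410; // 0.1 s per buffer (assumed)
        for (float f : trackPitch(synthStream(sampleRate, bufferSize), bufferSize, sampleRate)) {
            System.out.println(f + " Hz");
        }
    }
}
```

Each buffer yields its own estimate, so the output follows the jump from 220 Hz to 440 Hz. The estimates land a few hertz low because the crossing that falls right at the end of each buffer is not counted; longer buffers shrink this error but react more slowly to pitch changes.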

3. Laying down the Math

Let’s look at the math to make the concept clearer:

Step 1: Traverse the samples and compare each sample i with the sample after it, i + 1. If one of them is above the silent threshold and the other is below it, or if sample i lies off the threshold and sample i + 1 lands exactly on it, the signal has crossed the threshold between them, and we count one zero crossing.
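Step 1 boils down to a sign-change test between adjacent samples. The sketch below isolates that test, using the same condition as the Java implementation later in this post, and applies it to a tiny hand-made buffer. The class and method names are hypothetical.

```java
public class ZeroCrossingRule {
    // True when the signal crosses the silent threshold (zero) between two adjacent samples.
    static boolean isZeroCrossing(short current, short next) {
        return (current > 0 && next <= 0) || (current < 0 && next >= 0);
    }

    // Apply the test to every adjacent pair in the buffer.
    static int countZeroCrossings(short[] samples) {
        int count = 0;
        for (int i = 0; i < samples.length - 1; i++) {
            if (isZeroCrossing(samples[i], samples[i + 1])) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        short[] buffer = {3, 5, -2, -4, 0, 6, 1, -1};
        System.out.println(countZeroCrossings(buffer)); // 3
    }
}
```

The buffer contains three crossings (5 to -2, -4 to 0, and 1 to -1). The step from 0 to 6 is not counted again, so a signal that passes exactly through zero is counted only once.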

Step 2: Once the number of zero crossing points is known, the sampling rate of the signal and the total number of samples can be used to determine the number of seconds over which the zero crossing points were calculated as follows:

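Number of seconds over which the zero crossing points were calculated, where N is the total number of samples and f_s is the sampling rate in Hz:

$$t = \frac{N}{f_s}$$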

Step 3: For each oscillation of a signal, the signal crosses the silent threshold twice. So the number of oscillations will be given by:

Number of oscillations of the input audio signal, where Z is the number of zero crossings counted in Step 1:

$$C = \frac{Z}{2}$$

Step 4: The frequency of the signal can be calculated from the values in Steps 2 and 3 as follows:

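Frequency of the input audio signal in hertz, using the oscillation count C from Step 3 and the duration t from Step 2 (Z zero crossings, N samples, sampling rate f_s):

$$f = \frac{C}{t} = \frac{Z \cdot f_s}{2N}$$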
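As a worked example with made-up round numbers (not a measurement): suppose a buffer of N = 4410 samples recorded at a sampling rate of f_s = 44100 Hz contains Z = 88 zero crossings. Writing t for the duration in seconds and C for the number of oscillations, Steps 2 to 4 give:

```latex
t = \frac{N}{f_s} = \frac{4410}{44100} = 0.1\ \text{s}, \qquad
C = \frac{Z}{2} = \frac{88}{2} = 44, \qquad
f = \frac{C}{t} = \frac{44}{0.1\ \text{s}} = 440\ \text{Hz}
```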

For the curious souls, here is an amazing resource that explains ZCR in a Jupyter notebook: https://musicinformationretrieval.com/zcr.html

Implementation

Implementing the ZCR algorithm is straightforward. Below is an implementation I wrote in Java:

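public class ZeroCrossing
{
    private static final String TAG = "ZeroCrossing.java";

    /*
     * Calculate the frequency (in Hz) of a buffer of 16-bit PCM
     * samples by counting zero crossings.
     */
    public static int calculate(int sampleRate, short[] audioData)
    {
        int numSamples = audioData.length;
        int numCrossing = 0;

        // Step 1: count sign changes between adjacent samples
        for (int p = 0; p < numSamples - 1; p++)
        {
            if ((audioData[p] > 0 && audioData[p + 1] <= 0) ||
                    (audioData[p] < 0 && audioData[p + 1] >= 0))
            {
                numCrossing++;
            }
        }

        // Step 2: duration of the buffer in seconds
        float numSecondsRecorded = (float) numSamples / (float) sampleRate;
        // Step 3: two crossings per cycle; 2.0f avoids integer division,
        // which would silently drop half a cycle when numCrossing is odd
        float numCycles = numCrossing / 2.0f;
        // Step 4: cycles per second
        float frequency = numCycles / numSecondsRecorded;

        return (int) frequency;
    }
}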

The complete implementation as an Android app can be accessed here: https://github.com/Rishikeshdaoo/PitchDetector

A great implementation in Python is here.

Testing and Results

The zero crossing rate algorithm works very well for pure tones. As seen in Table 1, the percentage error in detecting a sine tone is extremely low, which indicates that the algorithm successfully detects the pitch of pure tones. The algorithm works poorly for musical instruments, however: the detected pitch fluctuates rapidly around the higher harmonics of the sound. This indicates that the algorithm does not handle sounds with strong harmonics well, since harmonics can make the waveform cross zero more than twice per period of the fundamental, inflating the crossing count.

Table 1: Test results of the ZCR algorithm
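To see why harmonics cause trouble, the following Java sketch feeds the same zero-crossing rule two synthetic one-second signals: a pure 100 Hz sine, and the same sine with an equally strong second harmonic (200 Hz) added. The signals, the 100 Hz fundamental and the names HarmonicsDemo, zcrFrequency and toneWithHarmonic are illustrative assumptions; real instrument tones contain many harmonics with varying strengths.

```java
public class HarmonicsDemo {
    // Frequency estimate using the same zero-crossing rule as the Java implementation above.
    static float zcrFrequency(short[] samples, int sampleRate) {
        int crossings = 0;
        for (int i = 0; i < samples.length - 1; i++) {
            if ((samples[i] > 0 && samples[i + 1] <= 0) || (samples[i] < 0 && samples[i + 1] >= 0)) {
                crossings++;
            }
        }
        float seconds = (float) samples.length / sampleRate;
        return (crossings / 2.0f) / seconds;
    }

    // Fundamental f0 plus a second harmonic scaled by harmonicGain (0 gives a pure tone).
    // With harmonicGain up to 1 the peak stays below 16,000, inside the 16-bit range.
    static short[] toneWithHarmonic(double f0, double harmonicGain, int sampleRate, int numSamples) {
        short[] out = new short[numSamples];
        for (int n = 0; n < numSamples; n++) {
            double theta = 2.0 * Math.PI * f0 * n / sampleRate;
            out[n] = (short) Math.round(8000.0 * (Math.sin(theta) + harmonicGain * Math.sin(2.0 * theta)));
        }
        return out;
    }

    public static void main(String[] args) {
        int sampleRate = 44100;
        int numSamples = sampleRate; // one second of audio
        float pure = zcrFrequency(toneWithHarmonic(100.0, 0.0, sampleRate, numSamples), sampleRate);
        float rich = zcrFrequency(toneWithHarmonic(100.0, 1.0, sampleRate, numSamples), sampleRate);
        System.out.println("Pure 100 Hz tone:      " + pure + " Hz");
        System.out.println("100 Hz + 2nd harmonic: " + rich + " Hz");
    }
}
```

The pure tone comes out at about 100 Hz, but with the harmonic added the estimate roughly doubles to about 200 Hz: the combined waveform crosses zero four times per cycle of the fundamental instead of twice.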

I hope this post helps out at least a few of you. Please let me know if there are any inaccuracies in the explanation or any improvements I could make. Also, please write to me if there is a topic you would like me to write about or a project you would like to work on together.

—

Rishi